Abstract. The separating words problem asks for the size of the smallest
DFA needed to distinguish between two words of length $\le n$ (by
accepting one and rejecting the other). In this paper we survey what is 
known and unknown about the problem, consider some variations, and 
prove several new results. 



1 Introduction 

Imagine a computing device with very limited powers. What is the simplest 
computational problem you could ask it to solve? It is not the addition of two 
numbers, nor sorting, nor string matching; it is telling two inputs apart:
distinguishing them in some way.

Take as our computational model the deterministic finite automaton, or DFA.
As usual, it consists of a 5-tuple, $M = (Q, \Sigma, \delta, q_0, F)$, where $Q$ is a finite
nonempty set of states, $\Sigma$ is a nonempty input alphabet, $\delta : Q \times \Sigma \to Q$ is
the transition function (assumed to be complete, that is, defined on all members of
its domain), $q_0 \in Q$ is the initial state, and $F \subseteq Q$ is a set of final states.

We say that a DFA M separates w and x if M accepts one but rejects the 
other. Given two distinct words w, x we let sep(w, x) be the number of states 
in the smallest DFA accepting w and rejecting x. For example, the DFA below 
separates 0010 from 1000. 




However, by a brief computation, we see that no 2-state DFA can separate
these two words. So sep(1000, 0010) = 3. Note that sep(w, x) = sep(x, w),
because the language of a DFA can be complemented by swapping the reject and
accept states.
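The "brief computation" can be reproduced by exhaustive search. The following is a minimal sketch, not the authors' method: it assumes a binary alphabet, start state 0, and the hypothetical helper names `final_state` and `sep`. It relies on the observation that a transition structure can separate two words exactly when they end in different states, since the final set can then be chosen freely.

```python
from itertools import product

def final_state(delta, word):
    """Run a complete DFA (start state 0) on a binary word; return the end state."""
    state = 0
    for ch in word:
        state = delta[state][int(ch)]
    return state

def sep(w, x, max_states=5):
    """Smallest k such that some k-state DFA over {0,1} separates w and x.

    Only transition tables are enumerated: if the two words end in
    different states, a suitable set of final states always exists.
    """
    if w == x:
        raise ValueError("words must be distinct")
    for k in range(1, max_states + 1):
        for flat in product(range(k), repeat=2 * k):
            delta = [flat[2 * q: 2 * q + 2] for q in range(k)]
            if final_state(delta, w) != final_state(delta, x):
                return k
    return None  # nothing found with at most max_states states

print(sep("1000", "0010"))  # 3
```

The search space grows as k^(2k), so this is only practical for very small state counts.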

We let $S(n) = \max_{w \ne x,\ |w|, |x| \le n} \mathrm{sep}(w, x)$. The separating words problem is to
determine good upper and lower bounds on $S(n)$. This problem was introduced 25
years ago by Goralčík and Koubek [5], who proved $S(n) = o(n)$. It was later
studied by Robson [7,8], who obtained the best upper bound so far:
$S(n) = O(n^{2/5} (\log n)^{3/5})$.

As an additional motivation, the separating words problem can be viewed 
as an inverse of a classical problem from the early days of automata theory: 
given two DFAs accepting different languages, what length of word suffices to 
distinguish them? More precisely, given two DFAs $M_1$ and $M_2$, with $m$ and
$n$ states, respectively, with $L(M_1) \ne L(M_2)$, what is a good bound on the
length of the shortest word accepted by one but not the other? The usual
cross-product construction quickly gives an upper bound of $mn - 1$ (make a DFA for
$L(M_1) \cap \overline{L(M_2)}$). But the optimal upper bound of $m + n - 2$ follows from
the usual algorithm for minimizing automata. Furthermore, this bound is best 
possible [HI Thm. 3.10.6]. For NFAs the bound is exponential in m and n [BJ. 

From the following result, already proved by Goralčík and Koubek [5], we
know that the hard case of word separation comes from words of equal length:

Proposition 1. Suppose $|w|, |x| \le n$ and $|w| \ne |x|$. Then $\mathrm{sep}(w, x) = O(\log n)$.
Furthermore, there is an infinite class of examples where $\mathrm{sep}(w, x) = \Omega(\log n)$.

We use the following lemma [TP] : 

Lemma 1. If $0 \le i, j \le n$ and $i \ne j$, then there is a prime $p \le 4.4 \log n$ such
that $i \not\equiv j \pmod{p}$.

Proof (of Proposition 1). If $|w| \ne |x|$, then by Lemma 1 there exists a prime
$p \le 4.4 \log n$ such that $|w| \bmod p \ne |x| \bmod p$. Hence a simple cycle of $p$ states
serves to distinguish $w$ from $x$.

On the other hand, no DFA with n states can distinguish

$0^{n-1}$ from $0^{n-1+\mathrm{lcm}(1,2,\ldots,n)}$.

To see this, let $p_i = \delta(q_0, 0^i)$ for $i \ge 0$. Then $p_i$ is ultimately periodic with
period $\le n$ and preperiod at most $n - 1$. Thus $p_{n-1} = p_{n-1+\mathrm{lcm}(1,2,\ldots,n)}$. Since
$\mathrm{lcm}(1, 2, \ldots, n) = e^{n(1+o(1))}$ by the prime number theorem, the $\Omega(\log n)$ lower
bound follows. □

As an example, suppose $|w| = 22$ and $|x| = 52$. Then $|w| \equiv 1 \pmod{7}$ and
$|x| \equiv 3 \pmod{7}$. So we can accept w and reject x with a DFA that uses a cycle
of size 7, as follows:
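A small sketch of the length-counting idea, under the assumption that the DFA is a single cycle of p states whose only job is to track the input length modulo p. The function names `smallest_separating_prime` and `cycle_accepts` are illustrative and do not come from the paper.

```python
def smallest_separating_prime(i, j):
    """Smallest prime p with i % p != j % p (requires i != j)."""
    if i == j:
        raise ValueError("lengths must differ")
    p = 2
    while True:
        is_prime = all(p % q for q in range(2, int(p ** 0.5) + 1))
        if is_prime and (i - j) % p != 0:
            return p
        p += 1

def cycle_accepts(word, p, residue):
    """Simulate a p-state cycle: every input symbol advances one step.
    Accept iff the length of the word is congruent to `residue` mod p."""
    state = 0
    for _ in word:
        state = (state + 1) % p
    return state == residue

p = smallest_separating_prime(22, 52)   # 52 - 22 = 30 = 2 * 3 * 5, so p = 7
w, x = "0" * 22, "0" * 52               # only the lengths matter here
print(p, cycle_accepts(w, p, 22 % p), cycle_accepts(x, p, 22 % p))
```

The prime 7 arises because it is the smallest prime not dividing the length difference 30.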




In what follows, then, we only consider the case of equal-length words, and
we redefine $S(n) = \max_{w \ne x,\ |w| = |x| = n} \mathrm{sep}(w, x)$. The goal of the paper is to
survey what is known and unknown, and to examine some variations on the
original problem. Our main new results are Theorems 2 and 3.



2 Independence of alphabet size 

As we have defined it, $S(n)$ could conceivably depend on the size of the alphabet
$\Sigma$. Let $S_k(n)$ be the maximum number of states needed to separate two length-$n$
words over an alphabet of size $k$. Then we might have a different value $S_k(n)$
depending on $k = |\Sigma|$. The following result shows this is not the case for $k \ge 2$.
This result was stated in [5] without proof; we supply a proof here.

Proposition 2. For all $k \ge 2$ we have $S_k(n) = S_2(n)$.

Proof. Suppose x, y are distinct length-n words over an alphabet $\Sigma$ of size $k > 2$.
Then x and y must differ in some position, say for $a \ne b$,

$x = x' a x''$, $y = y' b y''$,

for $|x'| = |y'|$.

Now map a to 0, b to 1, and all other letters of $\Sigma$ to 0. This gives two
new distinct binary words X and Y of length n. If X and Y can be separated
by an m-state DFA, then so can x and y, by renaming transitions of the DFA
to be over $\Sigma \setminus \{b\}$ and $\{b\}$ instead of 0 and 1, respectively. Thus $S_k(n) \le S_2(n)$.
But clearly $S_2(n) \le S_k(n)$, since every binary word can be considered as a word
over the larger alphabet $\Sigma$. So $S_k(n) = S_2(n)$. □
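The reduction in this proof can be sketched in a few lines. This is an illustration under stated assumptions: the helper names `binarize`, `lift` and `run` are hypothetical, DFAs are given as tables `delta[state][bit]` with start state 0, and the example words over {a, b, c} are made up.

```python
def binarize(x, y):
    """Send the letter of y at the first differing position to '1', all others to '0'."""
    assert len(x) == len(y) and x != y
    pos = next(i for i in range(len(x)) if x[i] != y[i])
    b = y[pos]
    to_bits = lambda word: "".join("1" if c == b else "0" for c in word)
    return to_bits(x), to_bits(y), b

def lift(delta, b):
    """Reinterpret a binary DFA over the larger alphabet: b acts as 1, everything else as 0."""
    return lambda state, letter: delta[state][1 if letter == b else 0]

def run(step, word, start=0):
    state = start
    for letter in word:
        state = step(state, letter)
    return state

x, y = "abcab", "abbab"
X, Y, b = binarize(x, y)                  # X = "01001", Y = "01101", b = "b"
delta = [[0, 1], [1, 2], [2, 0]]          # counts 1s modulo 3; separates X and Y
big = lift(delta, b)
print(run(big, x), run(big, y))           # 2 0
```

The lifted automaton has the same number of states as the binary one, which is the point of the argument.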



3 Average case 



One frustrating aspect of the separating words problem is that nearly all pairs of 
words can be easily separated. This means that bad examples cannot be easily 
produced by random search. 

Proposition 3. Consider a pair of words (w, x) selected uniformly from the set
of all pairs of unequal words of length n over an alphabet of size k. Then the
expected number of states needed to separate w from x is O(1).

Proof. With probability $1 - 1/k$, two randomly chosen words will differ in the first
position, which can be detected by an automaton with 3 states. With probability
$(1/k)(1 - 1/k)$ the words will agree in the first position, but differ in the second,
and so on. Hence the expected number of states needed to distinguish two randomly
chosen words is bounded by
$\sum_{i \ge 1} (i + 2)(1/k)^{i-1}(1 - 1/k) = (3k - 2)/(k - 1) \le 4$.
□
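The closed form in this proof can be checked numerically. The sketch below truncates the series after a fixed number of terms, which is an assumption of the illustration; the helper names are hypothetical.

```python
from fractions import Fraction

def truncated_expectation(k, terms=200):
    """Partial sum of sum_{i>=1} (i + 2) (1/k)^(i-1) (1 - 1/k)."""
    q = Fraction(1, k)
    return sum((i + 2) * q ** (i - 1) * (1 - q) for i in range(1, terms + 1))

def closed_form(k):
    """(3k - 2)/(k - 1), which equals 2 + k/(k - 1)."""
    return Fraction(3 * k - 2, k - 1)

for k in range(2, 6):
    print(k, float(truncated_expectation(k)), float(closed_form(k)))
```

The value 2 + k/(k - 1) is the mean of a geometric distribution with success probability 1 - 1/k, shifted by 2, and it is largest (equal to 4) at k = 2.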

4 Lower bounds for words of equal length 

First of all, there is a lower bound analogous to that in Proposition 1 for words
of equal length. This does not appear to have been known previously. 

Theorem 1. No DFA of at most n states can separate the equal-length binary
words $w = 0^{n-1} 1^{n-1+\mathrm{lcm}(1,2,\ldots,n)}$ and $x = 0^{n-1+\mathrm{lcm}(1,2,\ldots,n)} 1^{n-1}$.

Proof. In pictures, we have 




More formally, let M be any DFA with n states, let q be any state, and let
a be any letter. Let $p_i = \delta(q, a^i)$ for $i \ge 0$. Then $p_i$ is ultimately periodic with
period $\le n$ and preperiod ("tail") at most $n - 1$. Thus $p_{n-1} = p_{n-1+\mathrm{lcm}(1,2,\ldots,n)}$.

It follows that after processing $0^{n-1}$ and $0^{n-1+\mathrm{lcm}(1,2,\ldots,n)}$, M must be in the
same state. Similarly, after processing $0^{n-1} 1^{n-1+\mathrm{lcm}(1,2,\ldots,n)}$ and
$0^{n-1+\mathrm{lcm}(1,2,\ldots,n)} 1^{n-1}$, M must be in the same state. So no n-state machine can
separate w from x. □
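For very small n the statement can be confirmed by brute force. The following sketch builds the two words and tries every transition table with at most n states; it assumes a binary alphabet and start state 0, and the helper names are illustrative only.

```python
from functools import reduce
from itertools import product
from math import gcd

def lcm_upto(n):
    """lcm(1, 2, ..., n)."""
    return reduce(lambda a, b: a * b // gcd(a, b), range(1, n + 1), 1)

def theorem1_words(n):
    L = lcm_upto(n)
    w = "0" * (n - 1) + "1" * (n - 1 + L)
    x = "0" * (n - 1 + L) + "1" * (n - 1)
    return w, x

def some_dfa_separates(w, x, k):
    """True if some k-state transition table sends w and x to different states."""
    for flat in product(range(k), repeat=2 * k):
        sw = sx = 0
        for ch in w:
            sw = flat[2 * sw + int(ch)]
        for ch in x:
            sx = flat[2 * sx + int(ch)]
        if sw != sx:
            return True
    return False

for n in range(1, 4):
    w, x = theorem1_words(n)
    print(n, len(w), any(some_dfa_separates(w, x, k) for k in range(1, n + 1)))
```

The enumeration covers k^(2k) tables, so it is only feasible for n up to about 4.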

We now prove a series of very simple results showing that if w and x differ 
in some "easy-to-detect" way, then sep(w, x) is small. 



4.1 Differences Near the Beginning or End of Words 

Proposition 4. Suppose w and x are words that differ in some symbol that
occurs d positions from the start. Then $\mathrm{sep}(w, x) \le d + 2$.

Proof. Let t be a prefix of length d of w. Then t is not a prefix of x. We can
accept the language $t\Sigma^*$ using d + 2 states; such an automaton accepts w and
rejects x. □

For example, to separate 

01010011101100110000 

from 

01001111101011100101 
we can build a DFA to recognize words that begin with 0101: 




(Transitions to a dead state are omitted.) 
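A minimal simulation of such a prefix-recognizing automaton, written as a sketch: states 0 through len(t) record how much of t has been matched, state len(t) is an accepting sink, and -1 stands for the dead state. The function name `prefix_dfa_accepts` is an assumption of this illustration.

```python
def prefix_dfa_accepts(t, word):
    """Simulate the (len(t) + 2)-state DFA for the language t Sigma^*."""
    state = 0
    for ch in word:
        if state == -1 or state == len(t):
            continue                      # dead state and accepting state are sinks
        state = state + 1 if ch == t[state] else -1
    return state == len(t)

w = "01010011101100110000"
x = "01001111101011100101"
print(prefix_dfa_accepts("0101", w), prefix_dfa_accepts("0101", x))  # True False
```

With t = 0101 this uses 4 + 2 = 6 states, counting the dead state.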

Proposition 5. Suppose w and x differ in some symbol that occurs d positions
from the end. Then $\mathrm{sep}(w, x) \le d + 1$.

Proof. Let the DFA M be the usual pattern-recognizing automaton for the
length-d suffix s of w, ending in an accepting state if the suffix is recognized.
Then M accepts w but rejects x. States of M correspond to prefixes of s, and
$\delta(t, a)$ = the longest suffix of ta that is a prefix of s. □

For example, to separate 

11111010011001010101 

from 

11111011010010101101 



we can build a DFA to recognize those words that end in 0101: 
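The transition rule from the proof of Proposition 5 can be turned into a table directly. This sketch computes it naively (a quadratic-time construction chosen for clarity, not efficiency); state q stands for the prefix of s of length q, and the function names are hypothetical.

```python
def suffix_dfa(s, alphabet="01"):
    """delta[(q, a)] = length of the longest suffix of s[:q] + a that is a prefix of s."""
    delta = {}
    for q in range(len(s) + 1):
        for a in alphabet:
            ta = s[:q] + a
            k = min(len(s), len(ta))
            while ta[len(ta) - k:] != s[:k]:
                k -= 1                    # k = 0 always matches (empty word)
            delta[(q, a)] = k
    return delta

def ends_with(delta, s, word):
    state = 0
    for ch in word:
        state = delta[(state, ch)]
    return state == len(s)               # accepting state: all of s matched

delta = suffix_dfa("0101")
w = "11111010011001010101"
x = "11111011010010101101"
print(ends_with(delta, "0101", w), ends_with(delta, "0101", x))  # True False
```

For s = 0101 the table has 5 states, matching the bound d + 1 with d = 4.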



4.2 Fingerprints 

Define $|w|_a$ as the number of occurrences of the symbol a in the word w.

Proposition 6. If $|w|, |x| \le n$ and $|w|_a \ne |x|_a$ for some symbol a, then
$\mathrm{sep}(w, x) = O(\log n)$.

Proof. By the prime number theorem, if $|w|, |x| = n$, and w and x have k and m
occurrences of a respectively ($k \ne m$), then there is a prime $p = O(\log n)$ such
that $k \not\equiv m \pmod{p}$. So we can separate w from x just by counting the number
of a's, modulo p. □

Analogously, we have the following result. 

Proposition 7. If there is a pattern of length d occurring a differing number of
times in w and x, with $|w|, |x| \le n$, then $\mathrm{sep}(w, x) = O(d \log n)$.

4.3 Pairs with Low Hamming Distance 

The previous results have shown that if w and x have differing "fingerprints",
then they are easy to separate. By contrast, the next result shows that if w and 
x are very similar, then they are also easy to separate. 

The Hamming distance H(w,x) between two equal-length words w and x is 
defined to be the number of positions where they differ. 

Theorem 2. Let x and y be words of length n. If $H(x, y) \le d$, then
$\mathrm{sep}(x, y) = O(d \log n)$.

Proof. Without loss of generality, assume x and y are binary words, and x has 
a 1 in some position where y has a 0. Consider the following picture: 



x = 
Let $i_1 < i_2 < \cdots < i_d$ be the positions where x and y differ. Now consider
$N = (i_2 - i_1)(i_3 - i_1) \cdots (i_d - i_1)$. Then $N < n^{d-1}$. By the prime number theorem,
there exists some prime $p = O(\log N) = O(d \log n)$ such that N is not divisible
by p. So $i_j \not\equiv i_1 \pmod{p}$ for $2 \le j \le d$.

Define $a_{p,k}(x) = \left( \sum_{j \equiv k \pmod{p}} x_j \right) \bmod 2$. This value can be calculated by
a DFA consisting of two connected rings of p states each. We use such a DFA
calculating $a_{p,i_1}$. Since p is not a factor of N, none of the positions $i_2, \ldots, i_d$
are included in the count $a_{p,i_1}$, and the two words x and y agree in all other
positions. So x contains exactly one more 1 in these positions than y does, and
hence we can separate the two words using $O(d \log n)$ states. □
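The construction can be sketched as follows. Assumptions of this illustration: positions are 0-based rather than 1-based, the prime is found by trial division, and the function names `hamming_separator` and `parity_on_class` are made up. The example pair is also invented.

```python
def is_prime(p):
    return p >= 2 and all(p % q for q in range(2, int(p ** 0.5) + 1))

def hamming_separator(x, y):
    """Return (p, k): a prime p such that the first differing position is the only
    differing position in its residue class k modulo p."""
    diff = [j for j in range(len(x)) if x[j] != y[j]]
    i1 = diff[0]
    p = 2
    while not (is_prime(p) and all((j - i1) % p for j in diff[1:])):
        p += 1
    return p, i1 % p

def parity_on_class(word, p, k):
    """Parity of the number of 1s at positions congruent to k mod p.
    A DFA for this needs 2p states: position mod p times a parity bit."""
    return sum(int(word[j]) for j in range(k, len(word), p)) % 2

x = "1000000000"
y = "0000001000"          # differs from x at positions 0 and 6
p, k = hamming_separator(x, y)
print(p, k, parity_on_class(x, p, k), parity_on_class(y, p, k))  # 5 0 1 0
```

Here N = 6, so the primes 2 and 3 are skipped and p = 5 is the first prime that isolates position 0.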

5 Special classes of words 
5.1 Reversals 

It is natural to think that pairs of words that are related might be easier to 
separate than arbitrary words; for example, it might be easy to separate a word 
from its reversal. No better upper bound is known for this special case. However, 
we still have a lower bound of $\Omega(\log n)$ for this restricted problem:

Proposition 8. There exists a class of words w for which $\mathrm{sep}(w, w^R) = \Omega(\log n)$,
where $n = |w|$.

Proof. Consider separating

$w = 0^{t-1} 1 0^{t-1+\mathrm{lcm}(1,2,\ldots,t)}$

from

$w^R = 0^{t-1+\mathrm{lcm}(1,2,\ldots,t)} 1 0^{t-1}$.

Then, as before, no DFA with $\le t$ states can separate w from $w^R$. □

Must $\mathrm{sep}(w^R, x^R) = \mathrm{sep}(w, x)$? No, for w = 1000, x = 0010, we have

sep(w, x) = 3

but

$\mathrm{sep}(w^R, x^R) = 2$.

Open Problem 1. Is $|\mathrm{sep}(x, w) - \mathrm{sep}(x^R, w^R)|$ unbounded?



5.2 Conjugates 

Two words w, w' are conjugates if one is a cyclic shift of the other. For example, 
the English words enlist and listen are conjugates. Is the separating words 
problem any easier if restricted to pairs of conjugates? 

Proposition 9. There exists an infinite class of pairs of words w, x such that w, x
are conjugates, and $\mathrm{sep}(w, x) = \Omega(\log n)$ for $|w| = |x| = n$.

Proof. Consider again

$w = 0^{t-1} 1 0^{t-1+\mathrm{lcm}(1,2,\ldots,t)} 1$



and

$x = 0^{t-1+\mathrm{lcm}(1,2,\ldots,t)} 1 0^{t-1} 1$.




□ 



6 Nondeterministic separation 

We can define nsep(w, x) in analogy with sep: the number of states in the smallest 
NFA accepting w but rejecting x. There do not seem to be any published results 
about this measure. 

Now there is an asymmetry in the inputs: nsep(w, x) need not equal nsep(x, w).
For example, the following 2-state NFA accepts w = 000100 and rejects x =
010000, so $\mathrm{nsep}(w, x) \le 2$.



However, an easy computation shows that there is no 2-state NFA accepting
x and rejecting w, so $\mathrm{nsep}(x, w) \ge 3$.

Open Problem 2. Is $|\mathrm{nsep}(x, w) - \mathrm{nsep}(w, x)|$ unbounded?

A natural question is whether NFAs give more separation power than DFAs. 
Indeed they do, since sep(0001, 0111) = 3 but nsep(0001, 0111) = 2. However, 
a more interesting question is the extent to which nondeterminism helps with
separation: for example, whether it contributes only a constant factor or there
is any asymptotic improvement in the number of states required.
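The small NFA claims in this section can be checked by exhaustive search. The sketch below assumes a particular NFA model, namely a single initial state (state 0), no epsilon moves, and possibly partial transitions; state sets are encoded as bitmasks, and the function names are hypothetical.

```python
from itertools import product

def nfa_accepts(delta, finals, word, k):
    """delta[2*q + a] is a bitmask of successors of state q on letter a."""
    current = 1                               # bitmask {state 0}
    for ch in word:
        nxt = 0
        for q in range(k):
            if (current >> q) & 1:
                nxt |= delta[2 * q + int(ch)]
        current = nxt
    return (current & finals) != 0

def nfa_separates(w, x, k):
    """True if some k-state NFA accepts w and rejects x."""
    for delta in product(range(1 << k), repeat=2 * k):
        for finals in range(1 << k):
            if nfa_accepts(delta, finals, w, k) and not nfa_accepts(delta, finals, x, k):
                return True
    return False

print(nfa_separates("0001", "0111", 2))       # True
print(nfa_separates("000100", "010000", 2))   # True
print(nfa_separates("010000", "000100", 2))   # False
```

For k states there are 2^(k(2k+1)) candidates, so k = 2 is instantaneous while k = 3 already takes a noticeable amount of time in pure Python.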

Theorem 3. The quantity sep(w, x)/nsep(w, x) is unbounded.

Proof. Consider once again the words

$w = 0^{t-1+\mathrm{lcm}(1,2,\ldots,t)} 1^{t-1}$ and $x = 0^{t-1} 1^{t-1+\mathrm{lcm}(1,2,\ldots,t)}$,

where $t = n^2 - 3n + 2$, $n \ge 4$.

We know from Theorem 1 that any DFA separating these words must have
at least $t + 1 = n^2 - 3n + 3$ states.

Now consider the following NFA M:

The language accepted by this NFA is $\{0^a : a \in A\} 1^*$, where A is the set
of all integers representable by a non-negative integer linear combination of n
and n - 1. But $t - 1 = n^2 - 3n + 1 \notin A$, as can be seen by computing t - 1
modulo n - 1 and modulo n. On the other hand, every integer $\ge t$ is in A. Hence
$w = 0^{t-1+\mathrm{lcm}(1,2,\ldots,t)} 1^{t-1}$ is accepted by M but $x = 0^{t-1} 1^{t-1+\mathrm{lcm}(1,2,\ldots,t)}$ is not.

Now M has $2n = \Theta(\sqrt{t})$ states, so
$\mathrm{sep}(w, x)/\mathrm{nsep}(w, x) \ge (t + 1)/(2n) = \Omega(\sqrt{t}) = \Omega(\sqrt{\log |x|})$,
which is unbounded. □
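The arithmetic fact used about the set A can be checked directly for small n. This is a sketch with a hypothetical helper `representable`; it tests membership in A by trying every possible multiple of n.

```python
def representable(m, n):
    """True if m = a*n + b*(n - 1) for some integers a, b >= 0 (assumes n >= 2)."""
    return any((m - a * n) % (n - 1) == 0 for a in range(m // n + 1))

for n in range(4, 9):
    t = n * n - 3 * n + 2
    gap = not representable(t - 1, n)
    tail = all(representable(m, n) for m in range(t, t + 3 * n))
    print(n, t, gap, tail)
```

The value t - 1 = n^2 - 3n + 1 equals n(n - 1) - n - (n - 1), the largest integer not representable by two coprime generators n and n - 1.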



Open Problem 3. Find better bounds on sep(w, x)/nsep(w, x).

We can also get an $\Omega(\log n)$ lower bound for nondeterministic separation.
Theorem 4. No NFA of n states can separate

$0^{n^2-1} 1^{n^2-1+\mathrm{lcm}(1,2,\ldots,n)}$

from

$0^{n^2-1+\mathrm{lcm}(1,2,\ldots,n)} 1^{n^2-1}$.

Proof. A result of Chrobak [1], as corrected by To [H], states that every unary
n-state NFA is equivalent to one consisting of a "tail" of at most $O(n^2)$ states,
followed by a single nondeterministic state that leads to a set of cycles, each of
which has at most n states. The size of the tail was proved to be at most $n^2 - 2$
by Geffert [3].

Now we use the same argument as for DFAs above. □ 



Open Problem 4. Find better bounds on nsep(w, x) for $|w| = |x| = n$, as a
function of n.

Theorem 5. We have $\mathrm{nsep}(w, x) = \mathrm{nsep}(w^R, x^R)$.

Proof. Let M be an NFA with the smallest number of states accepting w and
rejecting x. Now make a new NFA M' with initial state equal to any one element
of $\delta(q_0, w) \cap F$ and final state $q_0$, and all other transitions of M reversed. Then M'
accepts $w^R$. But M' rejects $x^R$. For if M' accepted $x^R$ then M would also accept
x, since the input string and transitions are reversed. □



7 Separation by 2DPDA's 

In [2], the authors showed that words can be separated with small context-free 
grammars (and hence small PDA's). In this section we observe 

Proposition 10. Two distinct words of length n can be separated by a 2DPDA
of size $O(\log n)$.

Proof. Recall that a 2DPDA is a deterministic pushdown automaton, with
endmarkers surrounding the input, and two-way access to the input tape. Given
distinct strings w, x of length n, they must differ in some position p with $1 \le p \le n$.
Using $O(\log p)$ states, we can reach position p on the input tape and accept if
(say) the corresponding character equals w[p], and reject otherwise.

Here is how to access position p of the input. We show how to go from 
scanning position i to position 2i using a constant number of states: we move 
left on the input, pushing two symbols per move on the stack, until the left 
endmarker is reached. Now we move right, popping one symbol per move, until 
the initial stack symbol is reached. Using this as a subroutine, and applying it 
to the binary expansion of p, we can, using O(logp) states, reach position p of 
the input. □ 
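The head movements described above can be simulated to see that they land on position p. This sketch models only the head position and the stack height (it does not model a full 2DPDA), and it assumes the left endmarker sits at position 0 and that the input has length at least p. The function name `reach_position` is hypothetical.

```python
def reach_position(p):
    """Process the binary expansion of p from the most significant bit:
    double the head position with the stack subroutine, then step right on a 1 bit.
    Returns (final head position, number of phases)."""
    head, phases = 0, 0                 # start on the left endmarker
    for bit in bin(p)[2:]:
        stack = 0
        while head > 0:                 # move left, pushing two symbols per move
            head -= 1
            stack += 2
        while stack > 0:                # move right, popping one symbol per move
            head += 1
            stack -= 1
        if bit == "1":
            head += 1                   # one extra step for a 1 bit
        phases += 1
    return head, phases

print(reach_position(13))               # (13, 4)
```

Each phase needs a constant number of control states, and the number of phases is the bit length of p, which matches the O(log p) state count in the proof.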



8 Permutation automata 

We conclude by relating the separating words problem to a natural problem of 
algebra. 

Instead of arbitrary automata, we could restrict our attention to automata 
where each letter induces a permutation of the states ( "permutation automata" ) , 
as suggested by Robson [8]. He obtained an $O(n^{1/2})$ upper bound in this case.

For an n-state automaton, the action of each letter can be viewed as an
element of $S_n$, the symmetric group on n elements.

Turning the problem around, then, we could ask: what is the shortest pair
of distinct equal-length binary words w, x, such that for all morphisms
$\sigma : \{0, 1\}^* \to S_n$ we have $\sigma(w) = \sigma(x)$? Although one might suspect that the answer
is lcm(1, 2, ..., n), for n = 4 there is a shorter pair (of length 11): 00000011011
and 11011000000.
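The length-11 example can be verified by running through all 24 x 24 assignments of permutations in S_4 to the two letters. This is a sketch; permutations are represented as tuples over range(n), and the helper names `image` and `same_under_all_morphisms` are illustrative.

```python
from itertools import permutations

def image(word, perm0, perm1):
    """Apply the letters of `word` in order; result[i] is where state i ends up."""
    result = list(range(len(perm0)))
    for ch in word:
        step = perm0 if ch == "0" else perm1
        result = [step[i] for i in result]
    return tuple(result)

def same_under_all_morphisms(w, x, n):
    """True if w and x induce the same permutation for every choice of images in S_n."""
    perms = list(permutations(range(n)))
    return all(image(w, a, b) == image(x, a, b) for a in perms for b in perms)

print(same_under_all_morphisms("00000011011", "11011000000", 4))  # True
print(same_under_all_morphisms("01", "10", 3))                    # False
```

In automaton terms, a True result means that no 4-state permutation automaton can separate the two words.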

Now if $\sigma(w) = \sigma(x)$ for all $\sigma$, then (if we define $\sigma(x^{-1}) = \sigma(x)^{-1}$) we have
that $\sigma(w x^{-1})$ = the identity permutation for all $\sigma$.

Call any nonempty word y over the letters $0, 1, 0^{-1}, 1^{-1}$ an identical relation
if $\sigma(y)$ = the identity for all morphisms $\sigma$. We say y is nontrivial if y contains
no occurrences of $0 0^{-1}$ and $1 1^{-1}$.

What is the length $\ell$ of the shortest nontrivial identical relation over $S_n$?
Recently Gimadeev and Vyalyi [4] proved $\ell = 2^{O(\sqrt{n} \log n)}$.